We consider the problem of optimal asymptotically faithful compression for ensembles
of mixed quantum states. Although the optimal rate is unknown, we prove upper
and lower bounds and describe a series of illustrative examples of compression of mixed 
states. We also discuss a classical analogue of the problem. 



1 Introduction

The emergence of potentially useful theoretical protocols for using quantum states in cryptography
and quantum computation has increased the theoretical (and perhaps ultimately
practical) importance of questions about how quantum states can be compressed, transmit- 
ted across noisy or low-dimensional channels, and recovered, and otherwise manipulated in 
a fashion analogous to classical information. Most of the work done on these matters, begin- 
ning with Q, has focused on the manipulation of pure states, with mixed states appearing 
only in intermediate stages, as the result of noise. An exception is |§, which considered the 
copying or broadcasting of mixed states. When mixed states have appeared as states to 
be transmitted, it has usually been required that their potential entanglement with some 
reference system be preserved, as in Q. This focuses attention again on a pure state, 
the entangled state of system and reference system. As discussed further in Q there is a 
close relation between entanglement transmission and the transmission of pure states of the 
system itself.

In the present paper, we consider the compression or transmission of mixed states, 
without any requirement that their entanglement or correlation with other systems be pre- 
served. There might seem to be good reason to confine oneself to pure-state transmission, 
since mixed states, considered apart from any potential entanglement with other systems, 
might not seem particularly useful. This may be why the classical analogue of the problem 
we consider in this paper — the transmission of probability distributions — has not, to our 
knowledge, been previously studied. Game theory is perhaps the first situation that springs 
to mind in which one might wish to produce a mixed state intentionally, given that all pure 
states of which it may be viewed as a mixture are available, since it is well known to game 
theorists that mixed strategies may be better than any of their component pure strategies 
in important situations pL 0]. Thus a "practical" application of mixed-state compression 
might be the compression of mixed strategies, where the "decoding" is done by the player
playing the strategy or someone who shares his goals. In cryptographic applications (closely 
related to game theory, of course) and also in probabilistic classical algorithms, there may 
be a use for randomness and an interest in compressing it for efficient storage or transmis- 
sion. Indeed, quantum computation can enable more-efficient-than-classical sampling from 
probability distributions @, |§; there may be relations between these ideas and the work 
reported here. 

The problem of optimal compression for ensembles of pure quantum states has been 
solved [1], [9], but for sources of mixed states the minimal resources are unknown. This
question has also been considered by M. Horodecki in [11], [12]. In this paper, we consider
several variants of the question, depending on the fidelity criteria and encoding/decoding 
procedures used. Sections 2 and 3 present the problem, in variants depending on whether 
or not the encoder/compressor knows the identity of each state and can use it to help
encode, and depending on whether, in a block-coding setting, a marginal ("local") or total 
("global") fidelity criterion is used; Section 4 considers relations between these variants of 
the problem, in general and for the special case of pure states. Section 5 discusses the 
fact that the entropy of a source ensemble's average density operator provides (as in the 
pure-state case) an upper bound on the rate at which qubits must be used to represent 
the source. We also show that under the global fidelity criterion, if decodings are required 
to be unitary, this is actually the optimal rate. Section 6 formulates a classical version of 
the problem, which we have not seen treated in classical information theory, and discusses 
examples. In Section 7, we show with several examples that in contrast to the pure state 
case, it is possible with general decodings to compress to below the entropy of the average 
density operator. This section also introduces a useful preparation-visible technique, that 
of compression by purifications, which we show does better than our classical methods for 
some of the classical mixed-state compression problems considered in Section 6. Finally, in 
Section 8 we show that the Holevo quantity $S(\sum_i p_i \rho_i) - \sum_i p_i S(\rho_i)$ for an ensemble gives
a lower bound on the qubit rate required to represent a source. (A different proof is given
in [11].) We do not know whether this lower bound is attainable in general.



2 Formulation of the Problem 

In this paper, $S(\rho)$ will always denote the von Neumann entropy of a density matrix $\rho$ and
$H(p_1, \ldots, p_n)$ will denote the Shannon entropy of a probability distribution $p_1, \ldots, p_n$. In
both cases logarithms are taken to base 2:

$$S(\rho) := -\mathrm{tr}(\rho \log_2 \rho)$$

$$H(p_1, \ldots, p_n) := -\sum_i p_i \log_2 p_i$$
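The two entropies can be evaluated numerically. The following is a minimal sketch, assuming NumPy is available; the function names `shannon_entropy` and `von_neumann_entropy` and the two test states are illustrative choices, not part of the paper.

```python
import math
import numpy as np

def shannon_entropy(probs):
    """H(p_1, ..., p_n) in bits; terms with p_i = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def von_neumann_entropy(rho):
    """S(rho) = -tr(rho log2 rho), computed from the eigenvalues of rho."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(rho, dtype=complex))
    # Eigenvalues below the cutoff are treated as zero (rounding noise).
    return shannon_entropy([float(x) for x in eigenvalues if x > 1e-12])

# Illustrative states (assumptions for this sketch):
maximally_mixed = np.eye(2) / 2                   # one full bit of entropy
pure_plus = np.array([[0.5, 0.5], [0.5, 0.5]])    # a pure state, zero entropy

print(von_neumann_entropy(maximally_mixed), von_neumann_entropy(pure_plus))
```

For a diagonal density matrix the von Neumann entropy reduces to the Shannon entropy of the diagonal entries, which is why the second function simply reuses the first.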

Let $\rho_1, \ldots, \rho_n$ be a list of (possibly mixed) $d$-dimensional quantum states. Each state
is assigned a prior probability $p_1, \ldots, p_n$ respectively. We refer to such a list as a source or
ensemble of signal states, denoted by $E = \{p_i, \rho_i\}$. Alice is fed an unending sequence of
these signal states, with each successive state chosen randomly and independently from $E$.
At time $N$ she will have the total state $\sigma_N = \rho_{i_1} \otimes \cdots \otimes \rho_{i_N}$ with probability $p_{i_1} p_{i_2} \cdots p_{i_N}$.
Alice wants to perform either of the following tasks (which are equivalent for our con- 
siderations): 

• Communication: Alice wants to send the signals to Bob using a minimum number 
of qubits/signal so that Bob can reconstruct long sequences with "arbitrarily high 
fidelity". This involves a "coding procedure" for Alice and a "decoding procedure" 
for Bob (cf. later discussion for the precise meaning of all these terms). 

• Storage: Alternatively, Alice wants to store the signals as efficiently as possible. In 
this interpretation the coding procedure is used for putting signals into storage, and 
the decoding procedure for reconstituting them. 

We distinguish two fundamental situations for Alice: 

• Preparation-blind (blind): Alice is not given the identity of the individual (generally 
nonorthogonal) signal states (she knows only their prior distribution). 

• Preparation-visible (visible): Alice is given the identity of the individual signal states
(as well as their prior distribution). Indeed, in this case we may assume that she is 
simply provided with a sequence of the names of the states and she may prepare the 
states herself if she wishes. 

Note that in the blind case, Alice is being fed essentially quantum information, whereas 
in the visible case she is getting entirely classical information. In both cases, however, Bob 
on decoding is not required to identify the actual signal states, but only to produce high 
fidelity representatives of the correct sequence of states. Hence, even in the visible case, 
the problem is not one of classical coding/information theory. The visible case (for pure 
states) occurs, for example, in quantum cryptographic protocols (e.g. BB84 [13] and B92
[14]), where the sender (Alice) is also the state preparer.

3 Coding/Decoding Schemes and their Fidelity 

Let $\mathcal{H}_d$ denote the space of all $d$-dimensional states. Given any physical system in state $\rho$,
quantum mechanics allows only the following three types of operations:

• (OP1) A unitary transformation, $\rho \to U \rho U^\dagger$ ($U$ unitary).

• (OP2) Inclusion of an ancilla in a standard state $\rho_0$ (independent of $\rho$), $\rho \to \rho \otimes \rho_0$.

• (OP3) Discarding a subsystem (when $\rho$ is a state of a composite system $AB$), $\rho_{AB} \to \mathrm{tr}_B(\rho_{AB})$.

Note that (OP2) and (OP3) change the value of $d$.
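The three operations can be written as small matrix routines. The sketch below assumes NumPy; the helper names and the particular example (a qubit, an ancilla in a fixed pure state, and a CNOT gate) are assumptions chosen for illustration. It composes OP2, OP1 and OP3 to produce a non-unitary map on the original system.

```python
import numpy as np

def apply_unitary(rho, unitary):
    """(OP1) rho -> U rho U^dagger."""
    return unitary @ rho @ unitary.conj().T

def add_ancilla(rho, rho_0):
    """(OP2) rho -> rho tensor rho_0."""
    return np.kron(rho, rho_0)

def discard_second(rho_ab, dim_a, dim_b):
    """(OP3) partial trace over subsystem B of a state on A tensor B."""
    blocks = rho_ab.reshape(dim_a, dim_b, dim_a, dim_b)
    return np.trace(blocks, axis1=1, axis2=3)

# Illustrative input: the pure state (|0> + |1>)/sqrt(2) and an ancilla in |0><0|.
plus = np.array([[0.5, 0.5], [0.5, 0.5]])
ancilla = np.array([[1.0, 0.0], [0.0, 0.0]])
cnot = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

joint = apply_unitary(add_ancilla(plus, ancilla), cnot)  # dimension grows from 2 to 4
dephased = discard_second(joint, 2, 2)                   # dimension shrinks back to 2
print(dephased)
```

In this example the off-diagonal terms of the input are removed, so the composite of the three operations is a trace-preserving map that no single unitary on the system alone could implement.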

Consider any length-$N$ string of input states (given either visible or blind):

$$\sigma_N = \rho_{i_1} \otimes \cdots \otimes \rho_{i_N} \in \mathcal{H}_d^{\otimes N}, \qquad (1)$$

$$\mathrm{Prob}(\sigma_N) = p_{i_1} \cdots p_{i_N}. \qquad (2)$$

A coding/decoding scheme, using q qubits/signal, is defined by the following requirements, 
which are to be specified for all sufficiently large N. 



• Blind coding: Alice's coding procedure, if blind, is any specified sequence of the above
three operations applied to $\sigma_N$, giving a final state $\omega_N$ within the required resources
of $q$ qubits/signal (i.e., in $2^{qN}$ dimensions). Mathematically, any such sequence of
operations corresponds to a completely positive, trace-preserving map $\mathcal{C}$ on the density
operator, i.e., $\omega_N = \mathcal{C}(\rho_{i_1} \otimes \cdots \otimes \rho_{i_N})$, and any completely positive, trace-preserving
map corresponds to such a sequence of operations.

• Visible coding: Alice's coding procedure, if visible, corresponds to an arbitrary assignment
of a state $\omega_N \in \mathcal{H}_{2^{qN}}$ to each $\sigma_N$; i.e., Alice can build any state she pleases
as the coded version of the input string.

• Finally, Bob's decoding (analogous to blind coding) is any sequence of the above three
operations applied to the coded state, yielding a state $\tilde\sigma_N$ of $N$ $d$-dimensional systems.
Thus decoding is a completely positive, trace-preserving map from $\mathcal{H}_{2^{qN}}$ to $\mathcal{H}_{d^N}$.

Let us write Bob's decoded state, produced by coding followed by decoding of $\sigma_N = \rho_{i_1} \otimes \cdots \otimes \rho_{i_N}$,
as $\tilde\sigma_N = \tilde\sigma_{i_1 \ldots i_N}$. Let

$$\tilde\rho_k = \left(\text{trace of } \tilde\sigma_{i_1 \ldots i_N} \text{ over all signal spaces except the } k\text{-th}\right), \qquad k = 1, \ldots, N, \qquad (3)$$

be the reduced state in the $k$th signal position after coding and decoding; i.e., $\tilde\rho_k$ is the
decoded version of the $k$th transmitted state $\rho_{i_k}$. Let

$$F(\rho_1, \rho_2) = \left[ \mathrm{tr} \left( \rho_1^{1/2} \rho_2 \rho_1^{1/2} \right)^{1/2} \right]^2 \qquad (4)$$

denote the Bures-Uhlmann fidelity function [16, 17]. The coding/decoding scheme has
fidelity $1 - \epsilon$ if it satisfies the following fidelity requirement: There is an $N_0$ such that for
all $N > N_0$,

$$\sum_{\sigma_N} \mathrm{Prob}(\sigma_N) \prod_{k=1}^{N} F(\rho_{i_k}, \tilde\rho_k) > 1 - \epsilon \qquad \text{(LOCAL-FID)}. \qquad (5)$$

Note that high fidelity according to (LOCAL-FID) allows entanglement to be introduced
between output signal states, even though there was no entanglement in the input (1),
which was taken to be a product state. This is because we examine $\tilde\sigma_N$ only through its
partial traces (3), thus reducing the state to each position separately. In view of this we
might consider a stronger fidelity criterion (GLOBAL-FID), which replaces (LOCAL-FID)
by

$$\sum_{\sigma_N} \mathrm{Prob}(\sigma_N)\, F(\sigma_N, \tilde\sigma_N) > 1 - \epsilon \qquad \text{(GLOBAL-FID)}. \qquad (6)$$

For $\epsilon$ tending to zero, this eliminates extraneous entanglements in the output sequence. Note
that in a continuously varying situation with $\epsilon$ tending to zero, (GLOBAL-FID) implies
(LOCAL-FID) because (GLOBAL-FID) will require that $\tilde\sigma_N$ become arbitrarily close to
$\sigma_N$ and, hence, $F(\rho_{i_k}, \tilde\rho_k) \to 1$ for each $k$, too.

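Example 1. Alice wants to send Bob $\sigma_2 = \frac{1}{2}I \otimes \frac{1}{2}I$, so we may take the decoded state to
be $\tilde\sigma_2 = \frac{1}{2}I \otimes \frac{1}{2}I$, satisfying (LOCAL-FID) and (GLOBAL-FID) with $\epsilon = 0$, or

$$\tilde\sigma_2 = \frac{1}{2}\left(|0\rangle \otimes |0\rangle + |1\rangle \otimes |1\rangle\right)\left(\langle 0| \otimes \langle 0| + \langle 1| \otimes \langle 1|\right),$$

satisfying (LOCAL-FID) with $\epsilon = 0$ but not (GLOBAL-FID). ■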
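Example 1 can be checked numerically. The sketch below assumes NumPy; the helpers `psd_sqrt`, `fidelity` and `reduced_states` are illustrative names, and the matrix square root is taken by eigendecomposition (one of several possible ways to evaluate the Bures-Uhlmann fidelity). It compares the product of the single-position fidelities with the fidelity of the full two-qubit states when the decoded state is the entangled one.

```python
import numpy as np

def psd_sqrt(matrix):
    """Square root of a positive semidefinite Hermitian matrix."""
    values, vectors = np.linalg.eigh(matrix)
    values = np.clip(values, 0.0, None)  # remove tiny negative rounding errors
    return (vectors * np.sqrt(values)) @ vectors.conj().T

def fidelity(rho_1, rho_2):
    """Bures-Uhlmann fidelity [tr (rho_1^(1/2) rho_2 rho_1^(1/2))^(1/2)]^2."""
    root = psd_sqrt(rho_1)
    return float(np.real(np.trace(psd_sqrt(root @ rho_2 @ root))) ** 2)

def reduced_states(rho_ab):
    """Reduced states of the first and second qubit of a two-qubit state."""
    blocks = rho_ab.reshape(2, 2, 2, 2)
    first = np.trace(blocks, axis1=1, axis2=3)
    second = np.trace(blocks, axis1=0, axis2=2)
    return first, second

half_identity = np.eye(2) / 2
sigma_in = np.kron(half_identity, half_identity)      # the input product state
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)     # (|00> + |11>)/sqrt(2)
sigma_out = np.outer(phi, phi)                        # the entangled decoded state

first, second = reduced_states(sigma_out)
local_fid = fidelity(half_identity, first) * fidelity(half_identity, second)
global_fid = fidelity(sigma_in, sigma_out)
print(local_fid, global_fid)
```

The local quantity equals 1 because both reduced states are maximally mixed, while the global fidelity is only 1/4, the overlap of the entangled pure state with the maximally mixed two-qubit state.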

We will generally adopt the fidelity requirement (GLOBAL-FID) in the following. If it 
is important that the signal states remain uncorrelated, (GLOBAL-FID) is the appropriate 
criterion; otherwise it may be too strong. 

Remark 2. (LOCAL-FID) has the following awkward feature: If we have a very high
fidelity coding/decoding scheme according to (LOCAL-FID) and we repeatedly apply it to
a long string, $\sigma_N \to \tilde\sigma_N \to \tilde{\tilde\sigma}_N$, then we will not necessarily preserve high fidelity in the
sequence of reduced states. This is because though $\sigma_N$ and $\tilde\sigma_N$ have essentially the same
reduced states at each position, globally they can be very different states (cf. Example 1).
Since the coding scheme is generally a block-coding scheme, it uses the global input state
and will work well only if this global state is a product state as in (1). Hence $\tilde{\tilde\sigma}_N$ will not
generally have the correct reduced states. ■

From the above precise formulations of the notions of coding, decoding, and fidelity, we 
obtain a well defined mathematical problem. 

Problem 3. For a given source $E$, find the greatest lower bound $q_{\min}$ of all $q$'s with
the following property: For all $\epsilon > 0$ there exists a coding/decoding scheme based on $q$
qubits/signal with fidelity $1 - \epsilon$.

This problem may be considered either in the blind or visible context, with the variation 
over encodings taken over the appropriate class of maps in each case. Similarly, it may be 
considered in the case of either of the fidelity criteria, (LOCAL-FID) or (GLOBAL-FID). 
We will say that the source $E$ can be coded (or compressed) at the rate $q_{\min}$.

Equivalently, the problem may be stated as follows: For a given source $E$, find $q_{\min}$ with
the following property. Given any $\delta > 0$, (a) if $q_{\min} + \delta$ qubits/signal are available, then for
every $\epsilon > 0$ there exists a coding scheme with fidelity $1 - \epsilon$, and (b) if $q_{\min} - \delta$ qubits/signal
are available, then there exists an $\epsilon > 0$ such that every coding scheme will have fidelity
less than $1 - \epsilon$.

4 Comparing the Formulation with Schumacher's
Coding for Pure States



The problem formulated above is intended to be a generalization of the scenario in Schumacher's
theorem [1], [18] to the case of mixed input states. Indeed, if the input states
happen all to be pure states, then the above formulation reduces precisely to the situation 
of Schumacher's theorem. It is interesting to note that several of the distinctions made 
above collapse in the special case of pure input states. 

Proposition 4. If the input states are all pure, then there is no distinction between the 
blind and visible problems. 



Proof: In Refs. [18], [10] an optimal coding/decoding scheme for the visible pure-state problem
is described. This optimal scheme turns out, remarkably, to be blind; i.e., knowledge
of the identities of the individual input signals gives Alice no further benefit in the case of 
pure states. ■ 

In [10] it is also shown that nonunitary decoding operations are of no advantage (on the
criterion (GLOBAL-FID)) in decoding for the pure-state problem. In contrast, for mixed- 
state signals, nonunitary decodings are generally essential for optimal compression. This 
follows from Theorem 7 and §7 below. 

Finally, the distinction between (LOCAL-FID) and (GLOBAL-FID) also collapses for 
pure signal states. 

Proposition 5. If the input states are all pure, then the two alternative fidelity criteria, 
(LOCAL-FID) and (GLOBAL-FID), become equivalent as $\epsilon$ is allowed to tend to zero.

Idea of proof: We already know that the (GLOBAL-FID) criterion implies the (LOCAL-FID)
criterion. Suppose that the (LOCAL-FID) criterion holds for a sequence of $\epsilon$ values
tending to zero. (Here we are thinking of a sequence of coding/decoding schemes which all
operate within the resource constraint of $q$ qubits/signal where $q > q_{\min}$.) Then the reduced
states $\tilde\rho_k$ of $\tilde\sigma_N$ become arbitrarily close to the input states $\rho_{i_k}$, which are pure. Hence $\tilde\sigma_N$
cannot be much entangled since entanglement always shows up as impurity in the reduced
states $\tilde\rho_k$. Thus $\tilde\sigma_N$ must approach the product state $\sigma_N$ and (GLOBAL-FID) holds. ■

As a consequence of Proposition 5, the awkward feature of (LOCAL-FID) described in 
Remark 2 does not arise in the coding of pure states. 

5 The $S(\rho)$ Upper Bound for $q_{\min}$

Let $\rho$ be the average density matrix of the input states:

$$\rho = \sum_{i=1}^{n} p_i \rho_i. \qquad (7)$$

Proposition 6. $S(\rho)$ is an upper bound for $q_{\min}$ under the criterion (GLOBAL-FID) (and
hence also under the criterion (LOCAL-FID)).

Proof: For each $\rho_i$, choose a representative ensemble of pure states corresponding to $\rho_i$, so
that we may view Alice as receiving an overall ensemble of pure states with density matrix
$\rho$. By Schumacher's theorem this may be transmitted to Bob with arbitrarily high fidelity
by compressing to $S(\rho)$ qubits/signal. ■

Note that this compression preserves too much internal structure: Bob faithfully reconstructs
Alice's chosen ensembles of pure states underlying the $\rho_i$'s rather than just the $\rho_i$'s
themselves. For our purposes it is sufficient for Bob to decode to any other representative
ensemble for the $\rho_i$'s. Hence we would expect that further compression is possible, and the
examples in §7 below show that it generally is. Furthermore, the coding in Proposition 
6 gives high fidelity relative to the stronger criterion (GLOBAL-FID); using the weaker 
(LOCAL-FID), one might expect even more compression. 

In fact we can say more, embodied in the following theorem. 

Theorem 7 Q Q. For the stronger fidelity criterion (GLOBAL-FID), if the decoding 
operation is required to be unitary (i.e., using only OP1 and OP2), then no further compression
is possible, i.e., $q_{\min} = S(\rho)$.

The proof is given in Appendix A.

Note that for the pure-state coding theorem, the decoding may indeed be taken to be 
unitary and (GLOBAL-FID) is used (being equivalent to (LOCAL-FID) by Proposition 5), 
but we do not necessarily wish to impose these conditions in the mixed-state case. 



6 A Classical Analogue 

In the case of Schumacher's pure-state coding theorem, there is a clear classical analogue, 
which has been well studied and completely solved, namely Shannon's noiseless coding the- 
orem. Though the classical analogue for the case of mixed states appears not to have been 
studied, it would involve the compression/communication of probability distributions. To 
formulate the classical problem, let there be a finite number of possible classical states, i.e., 
distinguishable alternatives (this is the analogue of our assumption of finite-dimensional 
Hilbert spaces), and identify the input and output classical states with particular orthonor- 
mal bases in input and output Hilbert spaces. Write probability weight functions on the 
sets of orthonormal pure states as column vectors $p = (p_1, \ldots, p_n)^T$ of probabilities. These
classical probability distributions then correspond to commuting density operators diagonal 
in the input and output bases. 

We may formulate classical preparation-blind coding or decoding procedures as multipli- 
cation of input probability vectors by a stochastic matrix A (one with nonnegative entries 
whose columns sum to one): 

$$p_{\text{out}} = A p_{\text{in}}. \qquad (8)$$

The stochasticity ensures that the matrix can be interpreted as a matrix of transition probabilities.
As in the quantum case, preparation-visible procedures are described by arbitrary
maps between the relevant spaces, in this case between the spaces of probability vectors.
The stochastic linear maps on the probability distributions correspond to a (convex) subset
of the trace-preserving completely positive maps on density operators, and a given classi- 
cal problem maps onto a corresponding quantum problem of sending commuting density 
operators. If we allow all possible trace-preserving completely positive maps, instead of 
just those which correspond to classical dynamics in the diagonalizing bases, we are using 
quantum means to deal with a classical problem, and we can compare the power of these 
quantum means to that of the purely classical means defined by restricting the allowable 
CP-maps to those that act as stochastic matrix multiplication in the given bases. 
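A classical blind procedure of this kind is simply a matrix-vector product. The sketch below uses plain Python lists; the function name `apply_stochastic` and the example matrix (an outcome flip with probability 0.1) are hypothetical choices made for illustration.

```python
def apply_stochastic(matrix, p_in):
    """Return p_out = A p_in for a column-stochastic matrix A given as a list of rows."""
    for col in range(len(p_in)):
        column = [row[col] for row in matrix]
        if any(entry < 0 for entry in column) or abs(sum(column) - 1.0) > 1e-12:
            raise ValueError("matrix is not column-stochastic")
    return [sum(a * p for a, p in zip(row, p_in)) for row in matrix]

# Hypothetical example: each outcome is flipped with probability 0.1.
flip = [[0.9, 0.1],
        [0.1, 0.9]]
p_out = apply_stochastic(flip, [0.7, 0.3])
print(p_out)
```

Because every column sums to one, the output is again a probability vector; entry (j, i) of the matrix is the probability of moving from classical state i to classical state j.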

These notions and comparisons are illustrated in the following examples, which are 
phrased in terms of the quantum language, i.e., viewing classical distributions as commuting 
mixed states. 

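Example 8. We have two input states, $\rho_1, \rho_2 \in \mathcal{H}_2$,

$$\rho_1 \leftrightarrow \mathrm{diag}(\alpha_1, 1 - \alpha_1), \qquad \rho_2 \leftrightarrow \mathrm{diag}(\alpha_2, 1 - \alpha_2), \qquad (9)$$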

which are simultaneously diagonal in a basis known to Alice and Bob. Let the prior probabilities
for these two states be $p_1$ and $p_2$. Classically we may regard the two states as
suitably biased coins, $C_1$ and $C_2$. A preparer chooses a sequence of coins, $C_1$ or $C_2$ with
probabilities $p_1$ and $p_2$, and tosses each of them once. The sequence of outcomes is passed
on to Alice. Since Alice can look at the sequence of outcomes, we can regard the sequence of 
outcomes as the realization of "Alice being given an unknown sequence of the two states." 
Notice that in the blind case, Alice cannot be given the actual coins that make up the 
input sequence, for she could then toss each one many times and identify the coins in the 
sequence, which is impossible to do given a single instance of each quantum state in the 
sequence. In contrast, in the visible case, Alice is given the sequence of coin names (or the 
actual coins, from which she could generate the sequence of coin names), together with a 
sequence of outcomes. In both cases, the objective of the protocol is to have Bob generate a 
sequence of outcomes that are governed by the same probabilities as Alice's input sequence 
of outcomes. Thus we have the following classical problems. 

Blind case: A preparer chooses a sequence of coins, $C_1$ or $C_2$ with prior probabilities
$p_1$ and $p_2$, tosses each of them a single time, and passes the sequence of outcomes on to
Alice. Alice "codes" her sequence of outcomes, and Bob "decodes" the result, obtaining 
an output sequence of outcomes. The coding/decoding processes may involve probabilistic 
processes. As before, Alice would like to compress the input sequence as much as possible 
for transmission. A perfect coding/decoding scheme would achieve the following: Suppose 
that in position 1 the preparer has used coin $C_2$; then, taking into account the probability
of outcomes in tossing $C_2$ and all probabilistic processes involved in coding/decoding, the
first entry in Bob's outcome sequence should have a probability distribution which is the
same as for coin $C_2$. A similar condition should apply at each position of the sequence.

This condition requires perfect fidelity of transmission of the distributions. In order to 
allow the usual situation of fidelity that approaches perfection only in an asymptotic limit 
of longer and longer block coding, we introduce a fidelity function for classical probability 
distributions. If $p = (p_1, \ldots, p_n)^T$ and $q = (q_1, \ldots, q_n)^T$ are two probability distributions
on the same space, then the fidelity is defined by

$$F_{\text{cl}}(p, q) = \left( \sum_i \sqrt{p_i q_i} \right)^2, \qquad (10)$$

which is also known as the Bhattacharyya-Wootters distance (or overlap) between the distributions.
Notice that $F_{\text{cl}}(p, q) = 1$ iff $p = q$. The classical fidelity $F_{\text{cl}}$ may be viewed as a
special case of the Bures-Uhlmann fidelity (4), i.e., $F_{\text{cl}}(p, q) = F(\rho_1, \rho_2)$ for two commuting
density operators, $\rho_1$ and $\rho_2$, that have $p$ and $q$ on their diagonals.
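The classical fidelity is a one-line computation. The sketch below uses only the standard library; the function name `classical_fidelity` and the two sample coins are assumptions made for illustration.

```python
import math

def classical_fidelity(p, q):
    """Squared sum of sqrt(p_i * q_i) over all outcomes i."""
    return sum(math.sqrt(p_i * q_i) for p_i, q_i in zip(p, q)) ** 2

# Illustrative coins: heads probabilities 0.1 and 0.9.
coin_1 = [0.1, 0.9]
coin_2 = [0.9, 0.1]

print(classical_fidelity(coin_1, coin_1), classical_fidelity(coin_1, coin_2))
```

For diagonal density operators the matrix square roots in the Bures-Uhlmann fidelity act entry by entry on the diagonal, which is why the quantum expression collapses to this sum.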

The problem is then to find the minimum number of bits/signal which suffices to code 
the input string with asymptotically arbitrarily high fidelity. (A precise formulation is very 
similar to that given for the quantum problem in §2.) There is an obvious upper bound 
on the minimum number of bits/signal: Alice may compress her outcome sequence to the
Shannon entropy of the average coin, $H(\alpha, 1 - \alpha) = S(\rho)$ bits/signal, where $\alpha = p_1\alpha_1 + p_2\alpha_2$
is the average probability for the first outcome; Bob can decode the compressed sequence
to produce an output outcome sequence that has asymptotically perfect fidelity. Because
we are dealing here with commuting density operators, this upper bound is the same as the
$S(\rho)$ upper bound of §5.

Visible case: In this case Alice is fed the sequence of actual coin names, $C_1$ or $C_2$, in
addition to outcomes of tossing each coin once. The blind-case upper bound of $H(\alpha, 1 - \alpha) = S(\rho)$
bits/signal applies also to the visible case, but there is an additional clear upper bound
in the visible case: Alice may simply send Bob the full information of which coin to use at
each stage; she can compress this data to the Shannon entropy of the prior distribution of
coin choices, i.e., $H(p_1, p_2)$ bits/signal.

Although we do not know the optimal number of bits/signal for this problem, we now
describe a purely classical coding/decoding scheme which beats both bounds for some values
of the parameters $p_1$, $p_2$, $\alpha_1$, and $\alpha_2$.

Example 9. Suppose that $\alpha_2 > \alpha_1$. Denote the coin toss outcomes by H and T, with H
having probability $\alpha_i$ for coin $C_i$. Alice sends one of three possible messages, $M_0$, $M_1$, or
$M_2$, to Bob according to the following (probabilistic) coding scheme:

• Regardless of the input coin ($C_1$ or $C_2$), Alice sends $M_0$ with probability $1 - \alpha_2 + \alpha_1$.

• If the message $M_0$ is not chosen (i.e., with probability $\alpha_2 - \alpha_1$), Alice sends $M_1$ if the
coin is $C_1$ and $M_2$ if the coin is $C_2$.

Bob responds to these signals as follows:

• For $M_0$ Bob probabilistically generates H or T with $\mathrm{prob}(H) = \alpha_1/(1 - \alpha_2 + \alpha_1)$ and
$\mathrm{prob}(T) = (1 - \alpha_2)/(1 - \alpha_2 + \alpha_1)$.

• For $M_1$ Bob generates T with probability 1.

• For $M_2$ Bob generates H with probability 1.

Curiously, in the latter two cases Bob actually learns the identity of the coin yet he responds
with a different distribution! It is readily verified that for each position in the sequence, taking
into account the probabilistic choices in coding/decoding, Bob's output result correctly
represents the result of one toss of the corresponding input coin.

The messages $M_0$, $M_1$, and $M_2$ are sent with probabilities $1 - \alpha_2 + \alpha_1$, $p_1(\alpha_2 - \alpha_1)$, and
$p_2(\alpha_2 - \alpha_1)$, so Alice can compress the sequence to

$$\Xi = H\big(1 - \alpha_2 + \alpha_1,\; p_1(\alpha_2 - \alpha_1),\; p_2(\alpha_2 - \alpha_1)\big) \text{ bits/coin toss}.$$

If $p_1 = p_2 = \frac{1}{2}$ and $\alpha_1 \approx \alpha_2 \approx \frac{1}{2}$, then $H(p_1, p_2) = 1$ and $S(\rho) = H(\alpha, 1 - \alpha) \approx H(\frac{1}{2}, \frac{1}{2}) = 1$,
whereas $\Xi \approx 0$, thus beating the two bounds. (For some other values of the parameters $\Xi$
exceeds both bounds.)
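The "readily verified" step and the resulting rate can be checked directly. The sketch below uses only the standard library; the function names and the sample parameters (heads probabilities 0.45 and 0.55 with equal priors) are assumptions chosen for illustration.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bob_heads_probability(coin, alpha_1, alpha_2):
    """Probability that Bob outputs H when the preparer used coin 1 or coin 2."""
    prob_m0 = 1 - alpha_2 + alpha_1              # Alice sends M0
    heads_given_m0 = alpha_1 / prob_m0           # Bob's response to M0
    heads_otherwise = 0.0 if coin == 1 else 1.0  # M1 -> T always, M2 -> H always
    return prob_m0 * heads_given_m0 + (alpha_2 - alpha_1) * heads_otherwise

def example_9_rate(p_1, p_2, alpha_1, alpha_2):
    """Entropy of the message distribution (M0, M1, M2) in bits per coin toss."""
    gap = alpha_2 - alpha_1
    return shannon_entropy([1 - gap, p_1 * gap, p_2 * gap])

# Illustrative parameters (assumptions for this sketch).
p_1, p_2, alpha_1, alpha_2 = 0.5, 0.5, 0.45, 0.55
alpha_avg = p_1 * alpha_1 + p_2 * alpha_2

print(bob_heads_probability(1, alpha_1, alpha_2), bob_heads_probability(2, alpha_1, alpha_2))
print(example_9_rate(p_1, p_2, alpha_1, alpha_2),
      shannon_entropy([alpha_avg, 1 - alpha_avg]),
      shannon_entropy([p_1, p_2]))
```

For coin 1 the heads probability is the M0 contribution alone, and for coin 2 the M2 branch adds the missing difference between the two heads probabilities, so Bob's output distribution matches the input coin in both cases.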

It has been conjectured that the minimum number of bits/signal for this classical problem
(and its natural generalization to many classical distributions) should be the mutual
information $H(\alpha, 1 - \alpha) - p_1 H(\alpha_1, 1 - \alpha_1) - p_2 H(\alpha_2, 1 - \alpha_2)$, even if global fidelity is required,
but this has resisted proof/disproof so far. (This would coincide with the lower bound given
in §8.) In Example 12 below we will describe a quantum protocol for this problem which is
better than all the above protocols.



7 Examples of Compression beyond $S(\rho)$

We now return to our main question of quantum coding for general sources of mixed states.
Though the problem of the optimal value of $q_{\min}$ remains unsolved, we describe here a series
of interesting examples of compression beyond the $S(\rho)$ upper bound given in §5. These
examples reveal something of the intricacy of this problem. (Notice that Example 9 already
provides a case of compression beyond the $S(\rho)$ bound in the classical context.) In the next
section we will derive a lower bound for $q_{\min}$.

Example 10 (Trivial cases). The following two situations are blind, but Alice may reliably 
identify the input states, thus making them visible. 

(a) Suppose that there is only one possible input signal $\rho_1 = \frac{1}{2}I$, so $\rho = \frac{1}{2}I$ and $S(\rho) = 1$
qubit/signal. Yet Alice need not send anything at all; i.e., we may compress to
0 qubits/signal.

(b) Suppose that the input signals $\rho_i$ with prior probabilities $p_i$ are supported on orthogonal
subspaces. (The support of a mixed state is defined as the subspace spanned by all
eigenvectors belonging to nonzero eigenvalues.) Thus Alice may reliably measure the identity
of the inputs and compress the resulting data to $H(p_1, \ldots, p_n)$ qubits/signal. Now for
orthogonally supported states we have generally

$$S(\rho) = H(p_1, \ldots, p_n) + \sum_i p_i S(\rho_i) > H(p_1, \ldots, p_n). \qquad (11)$$

■
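The entropy identity for orthogonally supported states follows from a short eigenvalue count. In the sketch below, $\lambda_{ij}$ denotes the $j$th nonzero eigenvalue of $\rho_i$; this symbol is introduced here for illustration and does not appear in the text. Because the supports are orthogonal, the nonzero eigenvalues of $\rho = \sum_i p_i \rho_i$ are exactly the products $p_i \lambda_{ij}$, and $\sum_j \lambda_{ij} = 1$ for each $i$.

```latex
\begin{align*}
S(\rho) &= -\sum_{i,j} p_i \lambda_{ij} \log_2 (p_i \lambda_{ij}) \\
        &= -\sum_i p_i \log_2 p_i \sum_j \lambda_{ij}
           \;-\; \sum_i p_i \sum_j \lambda_{ij} \log_2 \lambda_{ij} \\
        &= H(p_1, \ldots, p_n) + \sum_i p_i S(\rho_i).
\end{align*}
```

The second term vanishes only when every $\rho_i$ is pure, so the inequality is strict whenever at least one signal state with nonzero prior probability is mixed.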

Example 11 (A nontrivial blind example with noncommuting mixed input states). There
are two signal states, $\rho_1$ and $\rho_2$, in $m + n$ dimensions with prior probabilities $p_1$ and $p_2$.
The states have a block-diagonal form,

$$\rho_1 = \mathrm{diag}\big(\epsilon\sigma_1, (1 - \epsilon)\tau_1\big), \qquad \rho_2 = \mathrm{diag}\big(\epsilon\sigma_2, (1 - \epsilon)\tau_2\big),$$

where $\sigma_1$ and $\sigma_2$ are density matrices of size $m \times m$ and $\tau_1$ and $\tau_2$ are density matrices of
size $n \times n$. Writing

$$\rho = p_1\rho_1 + p_2\rho_2, \qquad \bar\sigma = p_1\sigma_1 + p_2\sigma_2, \qquad \bar\tau = p_1\tau_1 + p_2\tau_2,$$

one easily sees that

$$S(\rho) = H(\epsilon, 1 - \epsilon) + \epsilon S(\bar\sigma) + (1 - \epsilon) S(\bar\tau). \qquad (12)$$

In the $S(\rho)$ coding scheme of Proposition 6, we may interpret this formula as follows. For
a sequence of inputs Alice first measures the $\sigma$-space versus the $\tau$-space, projecting the
input state into whichever space is the outcome, and she compresses the resulting string of
subspace names to $H(\epsilon, 1 - \epsilon)$ bits/name. If the outcome space was "$\sigma$-subspace," a result
that occurs a fraction $\epsilon$ of the time, she compresses the post-measurement state to $S(\bar\sigma)$
qubits/signal, and similarly if the outcome was "$\tau$-subspace," which occurs $(1 - \epsilon)$ of the
time, she compresses to $S(\bar\tau)$ qubits/signal. Thus the total sending resources is the sum of
these three terms in (12).

Now suppose that $\sigma_1 \neq \sigma_2$, but that $\tau_1 = \tau_2 = \tau$. Then, in the case that Alice's
measurement outcome is "$\tau$-subspace," a result that also becomes known to Bob through
the communication of the subspace names, she need not send the post-measurement state
at all, as Bob already knows it (i.e., $\tau$) and can construct it himself. Thus we may drop
the last term in (12) and communicate the mixed states (with perfect fidelity) using only
$H(\epsilon, 1 - \epsilon) + \epsilon S(\bar\sigma)$ qubits/signal, which is less than $S(\rho)$ by an amount $(1 - \epsilon)S(\bar\tau)$. ■
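The entropy decomposition of Example 11 and the resulting saving can be checked on a small instance. The sketch below assumes NumPy; the helper names and the concrete blocks (two noncommuting pure states in the first block, a shared maximally mixed state in the second, with weight 0.2 on the first block and equal priors) are assumptions chosen for illustration.

```python
import math
import numpy as np

def entropy_bits(rho):
    """Von Neumann entropy in bits from the eigenvalues of a Hermitian matrix."""
    eigenvalues = np.linalg.eigvalsh(rho)
    return -sum(float(x) * math.log2(float(x)) for x in eigenvalues if x > 1e-12)

def block_diag(a, b):
    """Place square matrices a and b on the diagonal of a larger zero matrix."""
    m, n = a.shape[0], b.shape[0]
    out = np.zeros((m + n, m + n))
    out[:m, :m] = a
    out[m:, m:] = b
    return out

eps, p_1, p_2 = 0.2, 0.5, 0.5
sigma_1 = np.array([[1.0, 0.0], [0.0, 0.0]])   # |0><0|
sigma_2 = np.array([[0.5, 0.5], [0.5, 0.5]])   # |+><+|, does not commute with sigma_1
tau = np.eye(2) / 2                            # common second block

rho_1 = block_diag(eps * sigma_1, (1 - eps) * tau)
rho_2 = block_diag(eps * sigma_2, (1 - eps) * tau)
rho_avg = p_1 * rho_1 + p_2 * rho_2
sigma_avg = p_1 * sigma_1 + p_2 * sigma_2

h_eps = entropy_bits(np.diag([eps, 1 - eps]))
full_rate = entropy_bits(rho_avg)
decomposition = h_eps + eps * entropy_bits(sigma_avg) + (1 - eps) * entropy_bits(tau)
reduced_rate = h_eps + eps * entropy_bits(sigma_avg)
print(full_rate, decomposition, reduced_rate)
```

In this instance the saving equals the weight of the second block times one bit, since the shared second-block state is maximally mixed on a qubit.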

Example 12 (Visible coding by purification of the input states). The general idea here 
(cf. also is that in the visible situation, Alice may build purifications of the input mixed 
states and send these purifications (which are pure states) to Bob utilizing the compression 
of Schumacher's pure state coding theorem. On reception Bob regains the mixed states by 
selecting a suitable subsystem of each decoded purification state. 

As a first example, consider a special case of states of the form in Example 8. There are
two possible input states,

$$\rho_1 = \mathrm{diag}(\epsilon, 1 - \epsilon), \qquad \rho_2 = \mathrm{diag}(1 - \epsilon, \epsilon), \qquad (13)$$

with equal prior probabilities $p_1 = p_2 = \frac{1}{2}$. Hence $S(\rho) = 1$ and $H(p_1, p_2) = 1$. After
constructing purifications $|\psi_1\rangle$ and $|\psi_2\rangle$, Alice's task is to send a 50/50 mixture of $|\psi_1\rangle$
and $|\psi_2\rangle$. Thus to get the greatest benefit from Schumacher compression, the purifications
should be chosen so that their ensemble has least von Neumann entropy; i.e., the two
purifications should be as parallel as possible. According to Bures and Uhlmann's basic
theorem [15, 16, 17], the minimum possible angle $\theta_{\min}$ between purifications of $\rho_1$ and $\rho_2$ is
given by

$$\cos^2 \theta_{\min} = F(\rho_1, \rho_2).$$

Moreover, a 50/50 mixture of states at angle $\theta_{\min}$ has entropy

$$S_{\min} = H\left(\frac{1 + \cos\theta_{\min}}{2}, \frac{1 - \cos\theta_{\min}}{2}\right),$$

which gives the Schumacher limit of qubits/signal in compression by this method.
For the states in (13) we readily compute

$$F(\rho_1, \rho_2) = 4\epsilon(1 - \epsilon),$$

so that Alice may compress the purification ensemble to

$$T(\epsilon) = H\left(\tfrac{1}{2} + \sqrt{\epsilon(1 - \epsilon)},\; \tfrac{1}{2} - \sqrt{\epsilon(1 - \epsilon)}\right) \text{ qubits/signal},$$

which is better than $S(\rho)$ or $H(p_1, p_2)$, being equal to these only when $\epsilon$ is 0 or 1.

Note that the purely classical compression method of Example 9 applies to this case, 
too. The relevant parameters are p\ = p2 = |, a\ = e, a 2 = 1 — e, and < e < \. The 
method of Example 9 gives compression to 

E(e) = H^2e, i - e, i - ej = H(2e, 1 - 2e) + 1 - 2e qubits/signal , 

and we get 

3(e) >T(e) for 0<e<i, 

with equality only for e is or 1/2. Thus, whenever the states p\ and p2 of Eq. ( |l~3| ) are 
mixed, the quantum purification compression beats the classical method of Example 9. ■ 
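The comparison between the two rates can be checked numerically. The following is a minimal sketch, assuming base-2 logarithms; the function names `shannon`, `purification_rate`, and `classical_rate` are illustrative and do not come from the paper.

```python
import math

def shannon(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def purification_rate(eps):
    """T(eps): entropy of a 50/50 mixture of two optimally parallel purifications."""
    c = 2.0 * math.sqrt(eps * (1.0 - eps))  # cos(theta_min) = sqrt(F) = sqrt(4 eps (1 - eps))
    return shannon([(1.0 + c) / 2.0, (1.0 - c) / 2.0])

def classical_rate(eps):
    """Xi(eps) = H(2 eps, 1/2 - eps, 1/2 - eps), meaningful for 0 <= eps <= 1/2."""
    return shannon([2.0 * eps, 0.5 - eps, 0.5 - eps])

if __name__ == "__main__":
    for eps in (0.01, 0.1, 0.25, 0.4, 0.49):
        print(f"eps={eps:.2f}  T={purification_rate(eps):.4f}  Xi={classical_rate(eps):.4f}")
```

On this grid the purification rate stays below the classical rate, and the two coincide at the endpoints $\epsilon = 0$ and $\epsilon = 1/2$.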

Remark 13 (A simple construction of optimally parallel purifications for commuting
states). Given any mixed state in diagonal form $\rho = \mathrm{diag}(p_1, \ldots, p_n)$, we may immediately
write down a canonical purification:

$$|\psi\rangle = \sum_{i=1}^{n} \sqrt{p_i}\, |e_i\rangle \otimes |e_i\rangle ,$$

where $\{|e_k\rangle\}$ is the diagonalizing basis for $\rho$. Given two such states,

$$\rho_1 = \mathrm{diag}(p_1, \ldots, p_n) , \qquad \rho_2 = \mathrm{diag}(q_1, \ldots, q_n) ,$$

the canonical purifications clearly satisfy

$$|\langle\psi_1|\psi_2\rangle|^2 = \left(\sum_{i=1}^{n} \sqrt{p_i}\sqrt{q_i}\right)^{2} = F(\rho_1,\rho_2) = (\text{Bures-Uhlmann limit for } \cos^2\theta) .$$

Thus for simultaneously diagonal states, the canonical purifications are always optimally
parallel.

Notice that the diagonal entries of $\rho_1$ and $\rho_2$ are classical distributions $\mathbf{p}$ and $\mathbf{q}$ and that

$$|\langle\psi_1|\psi_2\rangle|^2 = F(\rho_1,\rho_2) = F_{\rm cl}(\mathbf{p},\mathbf{q}) .$$

The construction of optimally parallel purifications converts the Bhattacharyya-Wootters 
overlap of classical distributions into quantum overlap of pure quantum states. In this way 
the methods of quantum coding may be applied to problems of compression of classical 
probability distributions. 
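As an illustration of this conversion, the sketch below builds the canonical purification of a diagonal state as a vector of real amplitudes and compares the squared overlap of two such purifications with the classical (Bhattacharyya-Wootters) fidelity. The helper names and the row-major indexing of the product basis are assumptions made for the example, not notation from the paper.

```python
import math

def canonical_purification(probs):
    """Amplitudes of sum_i sqrt(p_i)|e_i>|e_i> in the basis {|e_i>|e_j>}, row-major."""
    n = len(probs)
    amps = [0.0] * (n * n)
    for i, p in enumerate(probs):
        amps[i * n + i] = math.sqrt(p)
    return amps

def overlap_squared(a, b):
    """|<a|b>|^2 for real amplitude vectors."""
    return sum(x * y for x, y in zip(a, b)) ** 2

def classical_fidelity(p, q):
    """F_cl(p, q) = (sum_i sqrt(p_i q_i))^2."""
    return sum(math.sqrt(x * y) for x, y in zip(p, q)) ** 2

def reduced_diagonal(amps, n):
    """Diagonal of the reduced state of the first subsystem of a pure state."""
    return [sum(amps[i * n + j] ** 2 for j in range(n)) for i in range(n)]

if __name__ == "__main__":
    p, q = [0.5, 0.3, 0.2], [0.1, 0.6, 0.3]
    a, b = canonical_purification(p), canonical_purification(q)
    print(overlap_squared(a, b), classical_fidelity(p, q))
```

The reduced diagonal of each purification reproduces the original distribution, which confirms that the vectors are purifications of the intended diagonal states.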

Suppose now that we have two or more simultaneously diagonal states,

$$\rho^{(a)} = \mathrm{diag}\bigl(p_1^{(a)}, \ldots, p_n^{(a)}\bigr) , \qquad a = 1, \ldots, K .$$

Then their canonical purifications $\psi^{(a)}$ have the remarkable property that they are all
simultaneously pairwise maximally parallel. Recall that Uhlmann's theorem gives a limit 
on how parallel purifications can get for any pair of mixed states. It does not follow that this 
optimal parallelness can be simultaneously achieved by purifications of three or more states. 
Yet for simultaneously diagonal states, this optimal simultaneous parallelism is achieved by 
the canonical purifications. 

It seems unlikely, however, that maximum parallelness gives the best set of purifications 
for the purpose of mixed-state compression when there are three or more signal states. 
Jozsa and Schlienz [^] have shown the existence of pairs of pure-state ensembles $\{p_i, |\psi_i\rangle\}$
and $\{p_i, |\chi_i\rangle\}$ for which all homologous pairs in the second ensemble are less parallel (i.e.,
$\forall i,j$ $|\langle\chi_i|\chi_j\rangle| \le |\langle\psi_i|\psi_j\rangle|$) but for which the entropy of the second ensemble is nevertheless
smaller. This phenomenon is expected to persist under the added constraint that the states 
involved are purifications of the given mixed states. ■ 

Remark 14. If Alice sends Bob the canonical purification of $\rho$,

$$|\psi\rangle = \sum_{i} \sqrt{p_i}\, |e_i\rangle \otimes |e_i\rangle ,$$

she is actually supplying him with two copies of $\rho$, one for each of the two subsystems
of the purification. Therefore one suspects that this compression is not optimal, at least 
when the criterion (LOCAL-FID) is used. To benefit from this observation, we might try to 
construct purifications each of which codes two signal states, one in each subsystem of the 
purification. To do this, the two signal states must have the same eigenvalues, but they need 
not be identical (e.g., as occurs in Example 15 below). Thus the signal states would purify 
each other in pairs at the expense of introducing strong entanglement in the output signal 
sequence. This construction would have high (LOCAL-FID) fidelity, but low fidelity for the 
(GLOBAL-FID) criterion. Of course, even with the stronger criterion (GLOBAL-FID), it 
is not clear that the compression of Example 12 is optimal. 

Example 15. (The "photographic negative" example, another application of compression
by purification). Suppose that we have $d$ possible input signals $\rho_i$, where $\rho_i$ is the $d \times d$
diagonal density matrix with equal entries $1/(d-1)$ along the diagonal except for the $i$th
entry which is zero:

$$\rho_i = \frac{1}{d-1}\, \mathrm{diag}(1, 1, \ldots, 1, 0, 1, \ldots, 1) , \quad \text{where 0 is in the $i$th place.}$$

The signals all have equal prior probabilities $p_i = 1/d$, giving $\rho = I/d$.

The canonical purifications in $\mathcal{H}_d \otimes \mathcal{H}_d$ all lie in a $d$-dimensional subspace spanned by
$\{|e_i\rangle \otimes |e_i\rangle\}$, where $\{|e_i\rangle\}$ is the diagonalizing basis of the $\rho_i$'s. A direct calculation shows
that the equally weighted mixture of purifications in this $d$-dimensional subspace is a density
matrix

$$\tilde\rho = \frac{d-2}{d-1}\, |\psi\rangle\langle\psi| + \frac{1}{d-1}\, \frac{I}{d} ,$$

where $I$ is the identity matrix in the $d$-dimensional subspace, and

$$|\psi\rangle = \frac{1}{\sqrt{d}} \sum_{i=1}^{d} |e_i\rangle \otimes |e_i\rangle$$

is a maximally entangled state. Thus $\tilde\rho$ can be viewed as a mixture of a totally mixed
state $I/d$, with probability $1/(d-1)$, and a maximally entangled pure state $|\psi\rangle\langle\psi|$, with
probability $(d-2)/(d-1)$. Changing to a basis in which $|\psi\rangle$ is the first basis vector, we
can easily determine the eigenvalues of $\tilde\rho$ to be a nondegenerate eigenvalue $(d-1)/d$ and
$(d-1)$ degenerate eigenvalues $1/[d(d-1)]$. A short calculation gives

$$q = S(\tilde\rho) = H\!\left(\frac{1}{d}, 1 - \frac{1}{d}\right) + \frac{1}{d}\log(d-1) = \frac{2}{d}\log(d-1) - \log\!\left(1 - \frac{1}{d}\right) \ \text{qubits/signal}$$

for the compression scheme. Note that $q \to 0$ as $d \to \infty$.
Introducing the Holevo quantity for the ensemble $E = \{p_i, \rho_i\}$,

$$\chi(E) = S(\rho) - \sum_i p_i S(\rho_i) = -\log\!\left(1 - \frac{1}{d}\right) ,$$

we find

$$q = \chi + \frac{2}{d}\log(d-1) ,$$

so 9 ~ ^ X as d — > 00. Note that although we described this construction in terms of block 
coding the ensemble of canonical purifications for all the signals, it also provides canonical 
purifications for the ensemble of $N$-block mixed states. Nonetheless, for finite $d$, the above
bound remains greater than the Holevo bound. Thus, if the conjecture that the Holevo 
bound is achievable by visible compression is correct, then, perhaps surprisingly, canonical 
purification is a suboptimal method of compression. ■ 
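The numbers in this example are easy to reproduce. The sketch below, which assumes base-2 logarithms and uses illustrative helper names, builds the equally weighted mixture of canonical purifications in the basis $\{|e_j\rangle \otimes |e_j\rangle\}$, checks the two eigenvalues quoted above by applying the matrix to an eigenvector of each kind, and compares the purification rate $q$ with the Holevo quantity.

```python
import math

def mixture_matrix(d):
    """(1/d) sum_i |psi_i><psi_i| in the basis {|e_j>|e_j>}, where
    |psi_i> = (d-1)^(-1/2) sum_{j != i} |e_j>|e_j>."""
    m = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for r in range(d):
            for c in range(d):
                if r != i and c != i:
                    m[r][c] += 1.0 / (d * (d - 1))
    return m

def mat_vec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def rate_q(d):
    """Entropy (bits) of the mixture, from the eigenvalues quoted in the text."""
    eigs = [(d - 1) / d] + [1.0 / (d * (d - 1))] * (d - 1)
    return -sum(x * math.log2(x) for x in eigs)

def holevo_chi(d):
    """chi(E) = S(I/d) - S(rho_i) = log d - log(d - 1)."""
    return math.log2(d) - math.log2(d - 1)

if __name__ == "__main__":
    for d in (3, 10, 100, 1000):
        print(d, round(rate_q(d), 5), round(holevo_chi(d), 5))
```

For every $d \ge 3$ tried, $q$ exceeds $\chi$ by exactly $(2/d)\log(d-1)$, and both quantities shrink as $d$ grows.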

8 A lower bound on the rate of mixed-state compression 

There is a simple argument that the Holevo quantity for an ensemble $E = \{p_i, \rho_i\}$ of mixed
states is a lower bound on the rate at which such an ensemble can be coded. Here we use 
the global fidelity criterion (GLOBAL-FID), and encoding may be blind or visible. This
argument uses the result, shown for pure-state ensembles by Hausladen, Jozsa, Schumacher, 
Westmoreland, and Wootters pi]] and for general mixed-state ensembles by Holevo [^] and 
by Schumacher and Westmoreland |B3], that the Holevo quantity for an ensemble E is 
the capacity for classical information transmission using the states in the ensemble E as 
an alphabet. The gist of the argument is that if an ensemble of mixed states could be 
coded at a rate lower than its Holevo quantity, even with preparation-visible encoding, then 
one could code a Holevo quantity's worth of classical information into those mixed states, 
compress them to an ensemble on a channel space of size smaller than the Holevo quantity 
(per use), recover the original ensemble with high fidelity, and therefore recover the classical 
information. But since the classical information capacity of an ensemble of states cannot 
be larger than the log of the dimension of its Hilbert space (since this is greater than or 
equal to $\chi$ for any ensemble), this is impossible.

To formalize this argument, consider an ensemble (or source) $E = \{p_i, \rho_i\}$ of mixed
states $\rho_i$ with probabilities $p_i$ on a Hilbert space $\mathcal{H}$ of dimension $d$. The Holevo quantity
for this ensemble is

$$\chi(E) = S\!\left(\sum_i p_i \rho_i\right) - \sum_i p_i S(\rho_i) . \tag{14}$$

A sequence of $N$ signals from this source gives a state drawn from the ensemble

$$E^{\otimes N} = \{p_{i_1} p_{i_2} \cdots p_{i_N},\ \rho_{i_1} \otimes \rho_{i_2} \otimes \cdots \otimes \rho_{i_N}\} . \tag{15}$$

We introduce the notation $f(A) = \{p_i, f(\rho_i)\}$ for the ensemble obtained by applying a map
$f$ to the states of the ensemble $A$, and we write $\mathcal{B}(\mathcal{H})$ for the space of bounded operators
on a Hilbert space $\mathcal{H}$.
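For commuting states the Holevo quantity (14) and the block ensemble (15) reduce to classical bookkeeping, which makes them simple to compute. The following sketch is restricted to that special case: each state is represented by its list of eigenvalues in a shared eigenbasis, logarithms are base 2, and the names `holevo` and `tensor_power` are illustrative.

```python
import math
from itertools import product

def entropy_bits(dist):
    """Shannon entropy in bits of a list of eigenvalues."""
    return -sum(x * math.log2(x) for x in dist if x > 0.0)

def holevo(priors, states):
    """chi(E) of Eq. (14) for an ensemble of simultaneously diagonal states."""
    n = len(states[0])
    avg = [sum(p * s[k] for p, s in zip(priors, states)) for k in range(n)]
    return entropy_bits(avg) - sum(p * entropy_bits(s) for p, s in zip(priors, states))

def tensor_power(priors, states, n_copies):
    """The N-block ensemble of Eq. (15), again for diagonal states."""
    new_priors, new_states = [], []
    for idx in product(range(len(priors)), repeat=n_copies):
        new_priors.append(math.prod(priors[i] for i in idx))
        diag = [1.0]
        for i in idx:
            diag = [x * y for x in diag for y in states[i]]  # Kronecker product of diagonals
        new_states.append(diag)
    return new_priors, new_states

if __name__ == "__main__":
    priors, states = [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]
    print(holevo(priors, states), holevo(*tensor_power(priors, states, 2)))
```

In this example the Holevo quantity of the two-block ensemble is twice that of the single-signal ensemble, and neither exceeds the logarithm of the dimension.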

For two ensembles with the same probabilities, $A = \{p_i, \rho_i\}$ and $B = \{p_i, \sigma_i\}$, we define
an average fidelity by

$$F(A,B) = \sum_i p_i F(\rho_i, \sigma_i) . \tag{16}$$

In proving the main theorem of this section, we will need a lemma that bounds the absolute 
value of the difference in the Holevo quantities for two ensembles in terms of their average 
fidelity, provided the average fidelity is high enough. 

Lemma 16. If $F(A,B) \ge \sqrt{35/36}$, then

$$|\chi(A) - \chi(B)| \le (2 + 2\sqrt{2}) \sqrt{1 - F(A,B)}\, \log d + 1 , \tag{17}$$

where $d$ is the dimension of the state space of $A$ and $B$.
The proof is given in Appendix B.
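To see what the quantities in Lemma 16 look like in a concrete case, the sketch below evaluates the average fidelity (16), the two Holevo quantities, and the right-hand side of (17) for ensembles of commuting states. It assumes base-2 logarithms, takes the constants exactly as they appear in Eq. (17), and uses hypothetical helper names; it illustrates the statement and is not part of the proof.

```python
import math

def entropy_bits(dist):
    """Shannon entropy in bits of a list of eigenvalues."""
    return -sum(x * math.log2(x) for x in dist if x > 0.0)

def holevo(priors, states):
    """Holevo quantity for simultaneously diagonal states (lists of eigenvalues)."""
    n = len(states[0])
    avg = [sum(p * s[k] for p, s in zip(priors, states)) for k in range(n)]
    return entropy_bits(avg) - sum(p * entropy_bits(s) for p, s in zip(priors, states))

def average_fidelity(priors, states_a, states_b):
    """Eq. (16) for commuting states, where F(rho, sigma) = (sum_k sqrt(r_k s_k))^2."""
    return sum(
        p * sum(math.sqrt(x * y) for x, y in zip(a, b)) ** 2
        for p, a, b in zip(priors, states_a, states_b)
    )

def lemma16_bound(avg_fid, d):
    """Right-hand side of Eq. (17); None when the hypothesis F >= sqrt(35/36) fails."""
    if avg_fid < math.sqrt(35.0 / 36.0):
        return None
    gap = max(0.0, 1.0 - avg_fid)  # guard against rounding slightly above 1
    return (2.0 + 2.0 * math.sqrt(2.0)) * math.sqrt(gap) * math.log2(d) + 1.0

if __name__ == "__main__":
    priors = [0.5, 0.5]
    a = [[0.9, 0.1], [0.1, 0.9]]
    b = [[0.88, 0.12], [0.12, 0.88]]
    f = average_fidelity(priors, a, b)
    print(f, abs(holevo(priors, a) - holevo(priors, b)), lemma16_bound(f, 2))
```

For a single qubit the bound is loose, since the additive constant alone already exceeds $\log d$; the lemma becomes informative for $N$-block ensembles, where the dimension grows while the constant does not.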

Our formulation of the mixed-state compression problem for the fidelity criterion (GLOBAL-FID)
can now be stated succinctly. Relative to (GLOBAL-FID), the source $E$ can be coded
(or compressed) at a rate $q$ if there exists a channel Hilbert space $\mathcal{C}$ with $q = \log(\dim \mathcal{C})$
and encoding/decoding schemes $\{\mathcal{E}^{(N)}, \mathcal{D}^{(N)}\}$,

$$\mathcal{E}^{(N)} : \mathcal{B}(\mathcal{H}^{\otimes N}) \to \mathcal{B}(\mathcal{C}^{\otimes N}) , \qquad \mathcal{D}^{(N)} : \mathcal{B}(\mathcal{C}^{\otimes N}) \to \mathcal{B}(\mathcal{H}^{\otimes N}) , \tag{18}$$

such that

$$\lim_{N \to \infty} F(E^{\otimes N}, F^{(N)}) = 1 \quad \text{(GLOBAL-FID)} , \tag{19}$$

where

$$F^{(N)} = \mathcal{D}^{(N)} \circ \mathcal{E}^{(N)}(E^{\otimes N}) \tag{20}$$

is the ensemble after decoding. We require that the encodings $\mathcal{E}^{(N)}$ take density operators
to density operators and that the decodings $\mathcal{D}^{(N)}$ be trace-preserving completely positive
linear maps. Permitting the encodings to be arbitrary maps on density operators allows
for preparation-visible encoding; if $\mathcal{E}^{(N)}$ is a trace-preserving completely positive linear map,
then the compression is preparation-blind.

The argument outlined at the beginning of this section can be formalized in the following 
theorem. 

Theorem 17. For the fidelity criterion (GLOBAL-FID) and for both blind and visible
encodings, the Holevo quantity $\chi(E)$ for an ensemble $E = \{p_i, \rho_i\}$ is a lower bound for $q_{\min}$.

Proof: Suppose that the ensemble $E = \{p_i, \rho_i\}$ can be compressed at a rate $q < \chi(E)$ with
asymptotically high fidelity (Eq. (19)), whether preparation-blind or preparation-visible.
Consider the ensemble of channel states (density matrices)

$$W^{(N)} = \{p_{i_1} p_{i_2} \cdots p_{i_N},\ \omega_{i_1 i_2 \ldots i_N}\} , \tag{21}$$

where

$$\omega_{i_1 i_2 \ldots i_N} = \mathcal{E}^{(N)}(\rho_{i_1} \otimes \rho_{i_2} \otimes \cdots \otimes \rho_{i_N})$$

is the encoded state corresponding to the unencoded source state $\rho_{i_1} \otimes \rho_{i_2} \otimes \cdots \otimes \rho_{i_N}$. The
Holevo quantity for $W^{(N)}$ satisfies

$$\chi(W^{(N)}) \le Nq , \tag{22}$$

where $Nq$ is the log of the dimension of the channel Hilbert space for $N$ blocks of
channel.

Consider now the following procedure for using the $N$-block operators $\omega_{i_1 \ldots i_N}$ as an
alphabet to send classical information. Make codewords out of strings of $M$ of these operators.
Prune them as one would if one were coding using the operators $\rho_{i_1 \ldots i_N}$ of the
ensemble $F^{(N)}$ in the Holevo/Schumacher/Westmoreland procedure for attaining $\chi(F^{(N)})$
as classical capacity. As the first step in the decoding procedure, convert them using the
decoding $(\mathcal{D}^{(N)})^{\otimes M}$ into strings of the operators $\rho_{i_1 \ldots i_N}$ of the ensemble $F^{(N)}$. Then apply
the decoding measurement appropriate to that ensemble.

This procedure clearly uses the $N$-block ensemble $W^{(N)}$ to transmit classical information
at the rate $\chi(F^{(N)})$ per $N$ blocks. But by assumption [cf. (19)], the ensemble $F^{(N)}$ has,
at large enough $N$, arbitrarily high fidelity to the original ensemble $E^{\otimes N}$. Hence, applying
Lemma 16 to the ensembles $E^{\otimes N}$ and $F^{(N)}$, whose states lie in $d^N$-dimensional Hilbert
spaces, one finds

$$\left| \chi(E) - \frac{1}{N}\chi(F^{(N)}) \right| \le (2 + 2\sqrt{2}) \sqrt{1 - F(E^{\otimes N}, F^{(N)})}\, \log d + \frac{1}{N} . \tag{23}$$

Thus for large enough $N$, $\chi(F^{(N)})/N$ is arbitrarily close to $\chi(E)$, which is greater than
$\chi(W^{(N)})/N$ by at least an amount $\chi(E) - q$, independent of $N$. So for large enough $N$,
$\chi(F^{(N)})$ exceeds $\chi(W^{(N)})$, contradicting the fact that the classical capacity of the ensemble
$W^{(N)}$ is $\chi(W^{(N)})$. We conclude that the compression rate $q$ must satisfy $q \ge \chi(E)$. ■
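The passage from Lemma 16 to Eq. (23) is not spelled out in the proof. One way to fill it in, assuming the additivity of the Holevo quantity over product ensembles (which follows from the additivity of von Neumann entropy over tensor products but is not stated explicitly in the text), is the following:

```latex
\begin{align*}
\chi(E^{\otimes N}) &= N\,\chi(E), \\
\bigl|\chi(E^{\otimes N}) - \chi(F^{(N)})\bigr|
  &\le (2+2\sqrt{2})\,\sqrt{1 - F(E^{\otimes N},F^{(N)})}\;\log d^{N} + 1, \\
\Bigl|\chi(E) - \tfrac{1}{N}\chi(F^{(N)})\Bigr|
  &\le (2+2\sqrt{2})\,\sqrt{1 - F(E^{\otimes N},F^{(N)})}\;\log d + \tfrac{1}{N}.
\end{align*}
```

The second line is Lemma 16 applied in dimension $d^N$, and the third is obtained by dividing through by $N$; the right-hand side vanishes as $N \to \infty$ because the fidelity tends to 1.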

M. Horodecki [jllj has independently derived the lower bound of Theorem 17, using the 
nonincrease of the Holevo quantity under completely positive maps. This nonincrease is 
an easy consequence of the monotonicity of relative entropy under such maps [24, 25] (and
therefore of Lieb's fundamental concavity theorem [Pq]). (A good treatment of all of these
is to be found in [p?!].)

A special case of Theorem 17 is the lower bound of S(p) qubits per source signal on 
the rate of compression of ensembles of pure states. This lower bound was established 
for preparation-blind encodings and unitary decodings in [|]]; for arbitrary (preparation- 
blind or preparation- visible) encodings and unitary decodings in and, by somewhat 
technical arguments, for arbitrary encodings and decodings using completely positive trace- 
preserving maps in Q. The present result allows for arbitrary encodings and decodings 
using completely positive trace-preserving maps, so it provides an alternative and perhaps 
more satisfying derivation of the most general form of the pure-state lower bound. 

The lower bound in Theorem 17 raises the fundamental open question of whether the 
bound is achievable (with global fidelity) with either blind or visible encoding. If not, 
one would like an expression for the achievable rate in both cases. Even for transmitting 
classical mixed states, the question of the best achievable rate remains open, in both the 
variant allowing quantum means of compression and that requiring only classical means. 

A Proof of Theorem 7



Proposition 6 may be used for part of the proof, but we give a different argument that 
utilizes properties of the Bures-Uhlmann fidelity function throughout. We first establish 
two lemmas which are direct Bures-Uhlmann fidelity analogues of Lemmas 1 and 2 in [18]. 



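Lemma A1: Let $\rho$ and $\rho'$ be mixed states on $\mathcal{H}_n$ with $\rho'$ supported on a $d$-dimensional
subspace $D$. Then $F(\rho,\rho')$ is at most the sum of the $d$ largest eigenvalues of $\rho$, which we
write as $1 - \eta$.

Proof of Lemma A1: We use the fact that

$$F(\rho,\rho') = \inf_A\, \mathrm{trace}\,\rho A\ \mathrm{trace}\,\rho' A^{-1} , \tag{24}$$

where the infimum is over all strictly positive operators $A$ pg| ]. Choose

$$A = \begin{cases} 1 & \text{on } D \\ \epsilon 1 & \text{on } D^{\perp} \end{cases} \qquad (\text{for any } \epsilon > 0).$$

Then

$$\mathrm{trace}\,\rho' A^{-1} = 1 ,$$

and, writing $D$ and $D^{\perp}$ also for the projectors onto these subspaces,

$$\mathrm{trace}\,\rho A = \mathrm{trace}\,\rho D + \epsilon\, \mathrm{trace}\,\rho D^{\perp} \le 1 - \eta + \epsilon\, \mathrm{trace}\,\rho D^{\perp} .$$

Hence

$$F(\rho,\rho') \le \mathrm{trace}\,\rho A\ \mathrm{trace}\,\rho' A^{-1} \le 1 - \eta + \epsilon\, \mathrm{trace}\,\rho D^{\perp}$$

for every $\epsilon > 0$; letting $\epsilon \to 0$ gives $F(\rho,\rho') \le 1 - \eta$, as required. ■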
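Lemma A1 can be spot-checked numerically in the commuting case. The sketch below assumes that $\rho$ and $\rho'$ are diagonal in the same basis, so that $F = (\sum_i \sqrt{p_i q_i})^2$, and that $D$ is spanned by $d$ of the basis vectors; both restrictions, and all function names, are assumptions of the example rather than of the lemma.

```python
import math
import random

def fidelity_diag(p, q):
    """Fidelity of two commuting states given by their diagonals."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q)) ** 2

def random_distribution(n, rng):
    """A random probability vector of length n."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def worst_violation(n, d, trials, seed=0):
    """Largest observed value of F(rho, rho') minus the sum of the d largest
    eigenvalues of rho; Lemma A1 says this never exceeds zero."""
    rng = random.Random(seed)
    worst = -1.0
    for _ in range(trials):
        p = random_distribution(n, rng)
        support = rng.sample(range(n), d)      # coordinates spanning D
        w = random_distribution(d, rng)
        q = [0.0] * n
        for k, idx in enumerate(support):
            q[idx] = w[k]
        top_d = sum(sorted(p, reverse=True)[:d])
        worst = max(worst, fidelity_diag(p, q) - top_d)
    return worst

if __name__ == "__main__":
    print(worst_violation(n=6, d=2, trials=500))
```

The bound is attained when $\rho'$ is the normalized restriction of $\rho$ to its $d$ largest eigenvalues, for example $\rho = \mathrm{diag}(0.5, 0.3, 0.2)$ and $\rho' = \mathrm{diag}(0.625, 0.375, 0)$ with $d = 2$.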

To set the stage for the second lemma, consider a density operator $\rho$ on $\mathcal{H}_n$. Denote
the eigenvalues of $\rho$ in decreasing order by $\lambda_i$, $i = 1, \ldots, n$. Let $D$ be the $d$-dimensional
subspace spanned by the eigenvectors belonging to the $d$ largest eigenvalues of $\rho$; denote
the sum of these $d$ eigenvalues by $1 - \eta$. Denote the projector onto $D$ by $\Pi$, and let $|0\rangle$ be
any pure state in $D$. Now consider the density operator

$$\rho' = \Pi\rho\Pi + \eta\, |0\rangle\langle 0| .$$

This density operator can be obtained from $\rho$ by first applying the binary measurement
that projects onto $D$ (outcome "1") or onto $D^{\perp}$ (outcome "0") and then, if the outcome is
0, substituting $|0\rangle\langle 0|$ for the post-measurement state. With these preliminaries, the lemma
can be stated as follows.

Lemma A2: $F(\rho,\rho') \ge 1 - 2\eta$.

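Proof of Lemma A2: If we write $\rho$ in its orthonormal eigenbasis,

$$\rho = \sum_{i=1}^{n} \lambda_i\, |e_i\rangle\langle e_i| ,$$

$\rho'$ becomes

$$\rho' = \sum_{i=1}^{d} \lambda_i\, |e_i\rangle\langle e_i| + \eta\, |0\rangle\langle 0| .$$

Introduce the following purifications of $\rho$ and $\rho'$:

$$|\phi\rangle = \sum_{i=1}^{n} \sqrt{\lambda_i}\, |e_i\rangle \otimes |f_i\rangle ,$$

$$|\phi'\rangle = \sum_{i=1}^{d} \sqrt{\lambda_i}\, |e_i\rangle \otimes |f_i\rangle + \sqrt{\eta}\, |0\rangle \otimes |g\rangle .$$

Here the vectors $|f_i\rangle$, $i = 1, \ldots, n$, are orthonormal, and $|g\rangle$ is orthogonal to each $|f_i\rangle$. Since
fidelity is the maximum squared absolute value of the inner product of purifications, we have

$$F(\rho,\rho') \ge |\langle\phi|\phi'\rangle|^2 = \left(\sum_{i=1}^{d} \lambda_i\right)^{2} = (1-\eta)^2 \ge 1 - 2\eta ,$$

as required. ■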
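A small numerical instance of Lemma A2 can be worked out as follows. The sketch takes $|0\rangle$ to be the eigenvector of $\rho$ with the largest eigenvalue, which is one allowed choice and keeps $\rho'$ diagonal in the eigenbasis of $\rho$, so the commuting-state formula for the fidelity applies. The function names are illustrative.

```python
import math

def fidelity_diag(p, q):
    """Fidelity of two commuting states given by their diagonals."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q)) ** 2

def truncate_and_refill(eigs, d):
    """Return (spectrum of rho, spectrum of rho', eta) for
    rho' = Pi rho Pi + eta |0><0|, with |0> the top eigenvector of rho."""
    lam = sorted(eigs, reverse=True)
    eta = sum(lam[d:])                          # weight outside the subspace D
    rho_prime = lam[:d] + [0.0] * (len(lam) - d)
    rho_prime[0] += eta                         # discarded weight is moved onto |0>
    return lam, rho_prime, eta

if __name__ == "__main__":
    lam, rho_prime, eta = truncate_and_refill([0.4, 0.3, 0.15, 0.1, 0.05], d=3)
    f = fidelity_diag(lam, rho_prime)
    print(f, (1 - eta) ** 2, 1 - 2 * eta)
```

For this spectrum $\eta = 0.15$, and the computed fidelity lies above $(1-\eta)^2$, which in turn lies above $1 - 2\eta$, matching the chain of inequalities in the proof.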

Proof of Theorem 7: Suppose that we compress to $S(\rho) - \delta$ qubits/signal by any coding
method whatsoever. Then if the decoding scheme is unitary, the decoded state $\tilde\sigma_N$ of an
input string $\sigma_N$ of length $N$ is supported in $N(S(\rho) - \delta)$ qubits. Yet the density matrix for
strings of length $N$ is $\rho^{\otimes N}$, and by a standard typical sequences result (cf. [18]), the sum
of the $2^{N(S(\rho)-\delta)}$ largest eigenvalues of $\rho^{\otimes N}$ becomes arbitrarily small with increasing $N$.
Hence, by Lemma A1, $F(\sigma_N, \tilde\sigma_N)$ is arbitrarily small, too, and the fidelity cannot be high
by the (GLOBAL-FID) criterion. ■

On the other hand, if $S(\rho) + \delta$ qubits/signal are available, then Lemma A2 provides
an explicit high-fidelity coding scheme, with $D$ being the $2^{N(S(\rho)+\delta)}$-dimensional subspace
spanned by the $2^{N(S(\rho)+\delta)}$ weightiest eigenvectors of $\rho^{\otimes N}$.
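The typical-sequences fact used in both directions can be observed directly for a single-qubit source. The sketch below assumes $\rho = \mathrm{diag}(p, 1-p)$ with $p \ge 1/2$, so that the eigenvalues of $\rho^{\otimes N}$ are $p^{N-k}(1-p)^k$ with multiplicity $\binom{N}{k}$ and are already sorted by $k$; the function names and the particular values $p = 0.9$, $\delta = 0.2$ are choices made for illustration.

```python
import math

def binary_entropy(p):
    """Entropy in bits of the distribution (p, 1 - p), for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def top_eigenvalue_weight(p, n, rate):
    """Sum of the floor(2**(n*rate)) largest eigenvalues of rho^{(x)n}
    for rho = diag(p, 1-p) with p >= 1/2."""
    budget = int(2.0 ** (n * rate))      # number of eigenvalues we may keep
    total = 0.0
    for k in range(n + 1):               # k = number of factors carrying 1 - p
        if budget <= 0:
            break
        count = math.comb(n, k)          # multiplicity of p^(n-k) (1-p)^k
        value = p ** (n - k) * (1 - p) ** k
        take = min(count, budget)
        total += take * value
        budget -= take
    return total

if __name__ == "__main__":
    s = binary_entropy(0.9)
    for n in (100, 400):
        print(n, top_eigenvalue_weight(0.9, n, s - 0.2), top_eigenvalue_weight(0.9, n, s + 0.2))
```

Below the entropy the captured weight shrinks as $N$ grows, while above the entropy it approaches 1, which is the behaviour the two halves of the argument rely on.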



B Proof of Lemma 16 

The proof uses the following inequality (proved in §):

$$|S(\rho_1) - S(\rho_2)| \le 2\sqrt{1 - F(\rho_1,\rho_2)}\, \log d + 1 , \tag{25}$$

which is valid if

$$2\sqrt{1 - F(\rho_1,\rho_2)} \le \frac{1}{3} . \tag{26}$$

We use this to obtain a similar relation, but with the average ensemble fidelity in place of
the fidelity on the right hand side. Let $A = \{p_i, \rho_i\}$ and $B = \{p_i, \sigma_i\}$ again denote two
mixed-state ensembles having the same probabilities. Letting $\rho = \sum_i p_i \rho_i$ and $\sigma = \sum_i p_i \sigma_i$,
we have an inequality involving the error measure $\sqrt{1 - F(\rho,\sigma)}$. We need to convert this
into one involving the error measure $1 - F(A,B)$. Defining yet another error measure
$\delta = 1 - \sqrt{F(\rho,\sigma)}$, simple algebra gives $\sqrt{1 - F(\rho,\sigma)} = \sqrt{\delta(2-\delta)}$. The double concavity
of $G(\rho_1,\rho_2) = \sqrt{F(\rho_1,\rho_2)}$ (proved in Appendix C) gives

$$\sum_i p_i\, G(\rho_i,\sigma_i) \le G\!\left(\sum_i p_i \rho_i, \sum_i p_i \sigma_i\right) = G(\rho,\sigma) . \tag{27}$$

Hence

$$\delta = 1 - G(\rho,\sigma) \le 1 - \sum_i p_i\, G(\rho_i,\sigma_i) \le 1 - \sum_i p_i\, F(\rho_i,\sigma_i) = 1 - F(A,B) = \epsilon . \tag{28}$$

i 

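Therefore, we have the inequality

$$\begin{aligned}
|S(\rho) - S(\sigma)| &\le 2\sqrt{\delta(2-\delta)}\, \log d + 1 \\
&\le 2\sqrt{\epsilon(2-\epsilon)}\, \log d + 1 \\
&\le 2\sqrt{2}\sqrt{1 - F(A,B)}\, \log d + 1 .
\end{aligned} \tag{29}$$

This inequality is valid provided (26) holds, which is certainly true if

$$0 \le \epsilon \le 1 - \sqrt{35/36} \quad \Longleftrightarrow \quad F(A,B) \ge \sqrt{35/36} . \tag{30}$$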



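Furthermore, we can also use the inequality (25) to bound the difference in the average
entropies for the two ensembles of $d$-dimensional states,

$$\begin{aligned}
\left| \sum_i p_i S(\rho_i) - \sum_i p_i S(\sigma_i) \right| &\le \sum_i p_i \left[ 2\sqrt{1 - F(\rho_i,\sigma_i)}\, \log d + 1 \right] \\
&\le 2\sqrt{1 - \sum_i p_i F(\rho_i,\sigma_i)}\, \log d + 1 \\
&= 2\sqrt{1 - F(A,B)}\, \log d + 1 .
\end{aligned} \tag{31}$$

Combining Eqs. (29) and (31) yields the desired result (17). ■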



C Double concavity of $G(\rho_1, \rho_2)$

In this Appendix we show that

$$G(\rho_1,\rho_2) = \sqrt{F(\rho_1,\rho_2)} = \mathrm{trace} \sqrt{\sqrt{\rho_1}\, \rho_2 \sqrt{\rho_1}} \tag{32}$$

is doubly concave, i.e.,

$$G(\lambda\rho_1 + (1-\lambda)\sigma_1, \lambda\rho_2 + (1-\lambda)\sigma_2) \ge \lambda G(\rho_1,\rho_2) + (1-\lambda) G(\sigma_1,\sigma_2) . \tag{33}$$



The proof uses a representation of the quantum fidelity in terms of measurement probabilities.
Given a measurement described by a positive-operator-valued measure (POVM)
with POVM elements $E_i$, the probability for outcome $i$ is $p_i = \mathrm{trace}\,\rho E_i$. Fuchs and Caves
p9| showed that the quantum fidelity of $\rho_1$ and $\rho_2$ is the classical fidelity of the measurement
probabilities for the measurement that, according to the classical fidelity, best distinguishes
the two density operators, i.e.,

$$F(\rho_1,\rho_2) = \min_{\{E_i\}} F_{\rm cl}(\mathbf{p}_1,\mathbf{p}_2) . \tag{34}$$

Here the minimum is taken over all POVMs $\{E_i\}$, and $\mathbf{p}_1$ and $\mathbf{p}_2$ are the column vectors
of measurement probabilities for $\rho_1$ and $\rho_2$ generated by the POVM $\{E_i\}$.

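The proof begins by noting that for four positive real numbers,

$$0 \le \left(\sqrt{x_1 y_2} - \sqrt{x_2 y_1}\right)^2 = x_1 y_2 + x_2 y_1 - 2\sqrt{x_1 x_2 y_1 y_2} ,$$

from which it follows that the function $\sqrt{x_1 x_2}$ is doubly concave, i.e.,

$$\begin{aligned}
\sqrt{[\lambda x_1 + (1-\lambda) y_1][\lambda x_2 + (1-\lambda) y_2]} &= \sqrt{\lambda^2 x_1 x_2 + (1-\lambda)^2 y_1 y_2 + \lambda(1-\lambda)(x_1 y_2 + x_2 y_1)} \\
&\ge \sqrt{\lambda^2 x_1 x_2 + (1-\lambda)^2 y_1 y_2 + 2\lambda(1-\lambda)\sqrt{x_1 x_2 y_1 y_2}} \\
&= \lambda\sqrt{x_1 x_2} + (1-\lambda)\sqrt{y_1 y_2} .
\end{aligned}$$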



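The square root of the classical fidelity,

$$G_{\rm cl}(\mathbf{p},\mathbf{q}) = \sqrt{F_{\rm cl}(\mathbf{p},\mathbf{q})} = \sum_{i} \sqrt{p_i q_i} , \tag{35}$$

being a sum of such functions, is thus also doubly concave:

$$G_{\rm cl}(\lambda\mathbf{p}_1 + (1-\lambda)\mathbf{q}_1, \lambda\mathbf{p}_2 + (1-\lambda)\mathbf{q}_2) \ge \lambda G_{\rm cl}(\mathbf{p}_1,\mathbf{p}_2) + (1-\lambda) G_{\rm cl}(\mathbf{q}_1,\mathbf{q}_2) . \tag{36}$$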
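The classical inequality (36) lends itself to a quick randomized check. The sketch below draws random probability vectors and a random mixing weight and records the smallest value of the left-hand side minus the right-hand side; the sampling scheme and the function names are assumptions of the example.

```python
import math
import random

def g_cl(p, q):
    """Square root of the classical fidelity, sum_i sqrt(p_i q_i)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def mix(lam, u, v):
    """Convex combination lam*u + (1 - lam)*v of two probability vectors."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(u, v)]

def random_distribution(n, rng):
    """A random probability vector of length n."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def min_concavity_gap(n, trials, seed=0):
    """Smallest observed (lhs - rhs) of Eq. (36); double concavity predicts >= 0."""
    rng = random.Random(seed)
    gap = float("inf")
    for _ in range(trials):
        p1, p2, q1, q2 = (random_distribution(n, rng) for _ in range(4))
        lam = rng.random()
        lhs = g_cl(mix(lam, p1, q1), mix(lam, p2, q2))
        rhs = lam * g_cl(p1, p2) + (1.0 - lam) * g_cl(q1, q2)
        gap = min(gap, lhs - rhs)
    return gap

if __name__ == "__main__":
    print(min_concavity_gap(n=4, trials=1000))
```

A non-negative gap across all samples is consistent with the termwise argument above, since each summand $\sqrt{p_i q_i}$ is itself doubly concave.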



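Now use the representation (34), written in terms of square roots of fidelities, to show the
double concavity of $G(\rho_1,\rho_2)$:

$$\begin{aligned}
G(\lambda\rho_1 + (1-\lambda)\sigma_1, \lambda\rho_2 + (1-\lambda)\sigma_2) &= \min_{\{E_i\}} G_{\rm cl}(\lambda\mathbf{p}_1 + (1-\lambda)\mathbf{q}_1, \lambda\mathbf{p}_2 + (1-\lambda)\mathbf{q}_2) \\
&\ge \min_{\{E_i\}} \bigl( \lambda G_{\rm cl}(\mathbf{p}_1,\mathbf{p}_2) + (1-\lambda) G_{\rm cl}(\mathbf{q}_1,\mathbf{q}_2) \bigr) \\
&\ge \min_{\{E_i\}} \lambda G_{\rm cl}(\mathbf{p}_1,\mathbf{p}_2) + \min_{\{F_i\}} (1-\lambda) G_{\rm cl}(\mathbf{q}_1,\mathbf{q}_2) \\
&= \lambda G(\rho_1,\rho_2) + (1-\lambda) G(\sigma_1,\sigma_2) .
\end{aligned} \tag{37}$$

(Another proof, by M. A. Nielsen [30], uses the relation of quantum fidelity to purifications.)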